Integrative annotation of chromatin elements from ENCODE data

Authors

  • Michael M. Hoffman
  • Jason Ernst
  • Steven P. Wilder
  • Anshul Kundaje
  • Robert S. Harris
  • Max Libbrecht
  • Belinda Giardine
  • Paul M. Ellenbogen
  • Jeffrey A. Bilmes
  • Ewan Birney
  • Ross C. Hardison
  • Ian Dunham
  • Manolis Kellis
  • William Stafford Noble
Abstract

The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint and provide an unbiased approach for evaluating metrics of evolutionary constraint in humans. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.

Related articles

Identification and annotation of non-protein coding RNAs

Of the 3.3 billion bases of the human genome, only about 2% code for proteins. Until very recently, the remaining 98% were considered to be ’junk’ and functionless. However, large transcriptomic studies like ENCODE (ENCyclopedia Of DNA Elements) (1) or FANTOM (The Functional Annotation Of the Mammalian Genome) (2) have shown that around 90% of the genome is actively transcribed into RNA. T...

Accurate Promoter and Enhancer Identification in 127 ENCODE and Roadmap Epigenomics Cell Types and Tissues by GenoSTAN

Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotati...

Spectacle: Faster and more accurate chromatin state annotation using spectral learning

Recently, a wealth of epigenomic data has been generated by biochemical assays and next-generation sequencing (NGS) technologies. In particular, histone modification data generated by the ENCODE project and other large-scale projects show specific patterns associated with regulatory elements in the human genome. It is important to build a unified statistical model to decipher the patterns of mu...

Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data

Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping suc...

Journal title:

Volume 41, Issue

Pages -

Publication date: 2013